Crowdsourcing Evaluations of Classifier Interpretability

Authors

  • Amanda Hutton
  • Alexander Liu
  • Cheryl E. Martin
Abstract

This paper presents work using crowdsourcing to assess explanations for supervised text classification. In this paper, an explanation is defined to be a set of words from the input text that a classifier or human believes to be most useful for making a classification decision. We compared two types of explanations for classification decisions: human-generated and computer-generated. The comparison is based on whether the type of the explanation was identifiable and on which type of explanation was preferred. Crowdsourcing was used to collect two types of data for these experiments. First, human-generated explanations were collected by having users select an appropriate category for a piece of text and highlight words that best support this category. Second, users were asked to compare human- and computer-generated explanations and indicate which they preferred and why. The crowdsourced data used for this paper was collected primarily via Amazon’s Mechanical Turk, using several quality control methods. We found that in one test corpus, the two explanation types were virtually indistinguishable, and that participants did not have a significant preference for one type over another. For another corpus, the explanations were slightly more distinguishable, and participants preferred the computer-generated explanations at a small, but statistically significant, level. We conclude that computer-generated explanations for text classification can be comparable in quality to human-generated explanations.
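The abstract does not state how the computer-generated explanations were produced. As a rough illustration of the idea only, the sketch below shows one common way such an explanation could be derived: take a linear bag-of-words classifier and select the words from the input text with the highest weight for the predicted class. The function name, the toy weight table, and the class labels are assumptions for illustration, not the authors' method.

```python
# Hypothetical sketch: derive a word-level "explanation" from a linear
# bag-of-words classifier by keeping the k highest-weight words that
# actually occur in the input text. The weight table below is invented
# for illustration; the paper does not specify its classifier.

def explain(document_tokens, class_weights, predicted_class, k=5):
    """Return up to k words from the document with the largest weight
    for the predicted class, as a candidate explanation."""
    weights = class_weights[predicted_class]
    # Only words that appear in the input text can be part of the explanation.
    candidates = {w for w in document_tokens if w in weights}
    return sorted(candidates, key=lambda w: weights[w], reverse=True)[:k]

if __name__ == "__main__":
    # Toy per-class word weights for a two-class classifier (illustrative only).
    class_weights = {
        "sports": {"game": 2.1, "team": 1.8, "season": 1.2, "election": -1.5},
        "politics": {"election": 2.4, "senate": 1.9, "vote": 1.7, "game": -0.8},
    }
    tokens = "the team won the final game of the season".split()
    print(explain(tokens, class_weights, "sports"))  # -> ['game', 'team', 'season']
```

A human-generated explanation for the same text would simply be the words a crowd worker highlights, so both explanation types can be compared as sets of words drawn from the input.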


Similar articles

A Novel Measure for Coherence in Statistical Topic Models

Big data presents new challenges for understanding large text corpora. Topic modeling algorithms help understand the underlying patterns, or “topics”, in data. Researchers often read these topics in order to gain an understanding of the underlying corpus. It is important to evaluate the interpretability of these automatically generated topics. Methods have previously been designed to use ...


Crowdsourcing evaluation of high dynamic range image compression

Crowdsourcing is becoming a popular cost-effective alternative to lab-based evaluations for subjective quality assessment. However, crowd-based evaluations are constrained by the limited availability of display devices used by typical online workers, which makes the evaluation of high dynamic range (HDR) content a challenging task. In this paper, we investigate the feasibility of using low dyna...


Leveraging non-expert crowdsourcing workers for improper task detection in crowdsourcing marketplaces

Controlling the quality of tasks, i.e., propriety of posted jobs, is a major challenge in crowdsourcing marketplaces. Most existing crowdsourcing services prohibit requesters from posting illegal or objectionable tasks. Operators in marketplaces have to monitor tasks continuously to find such improper ones; however, it is very expensive to manually investigate each task. In this paper, we prese...


Crowdsourcing for Evaluating Machine Translation Quality

The recent popularity of machine translation has increased the demand for the evaluation of translations. However, the traditional evaluation approach, manual checking by a bilingual professional, is too expensive and too slow. In this study, we confirm the feasibility of crowdsourcing by analyzing the accuracy of crowdsourcing translation evaluations. We compare crowdsourcing scores to profess...


Leveraging Crowdsourcing to Detect Improper Tasks in Crowdsourcing Marketplaces

Controlling the quality of tasks is a major challenge in crowdsourcing marketplaces. Most of the existing crowdsourcing services prohibit requesters from posting illegal or objectionable tasks. Operators in the marketplaces have to monitor the tasks continuously to find such improper tasks; however, it is too expensive to manually investigate each task. In this paper, we present the reports of ...




Publication date: 2012